Abstract. Volunteered geographic information (VGI) is constantly being
added, edited or removed by users, so its quality is not static. As VGI
users do not necessarily have high spatial knowledge, system administrators
control the quality of information in order to provide the users with the
appropriate datasets. However, the quality of spatial data has several
parameters, so the appropriate data may differ from one application to
another. Unlike in geographic communities, presenting standard metadata
statements is not very effective, as VGI users may not be familiar with
them. In this paper, we propose providing VGI users with the spatial data
quality parameters through simple cartographic representations and letting
them decide on the appropriateness of the datasets for the application at
hand. The users select the desired quality parameters as well as the
visualization element (e.g. color, line thickness, intensity, style, etc.)
used to classify the datasets. The datasets are represented by the selected
element based on their metadata information, which helps the users to
visually evaluate the quality of the datasets.

Keywords: Volunteered geographic information (VGI), spatial data quality

1. Introduction

Recent advances in spatial data collection technologies and online services
have dramatically increased the contribution of ordinary people to
producing, sharing and using geographic information. A growing number of
cell phones, digital cameras, PDAs and other hand-held devices are equipped
with georeferenced data collection technologies, making it possible for
ordinary people to collect spatial data, which are then shared and
disseminated on the internet through web map services. This has led to a
huge source of spatial data, termed Volunteered Geographic Information (VGI)
by Mike Goodchild (Goodchild 2007). There are several VGI environments, such
as OpenStreetMap (OSM) and Wikimapia, whose data are provided by their
users.

Volunteered geographic information is constantly being added, edited or
removed by users. As in other crowd-sourced data environments (e.g.,
Wikipedia), existing data can be improved by users. Thus, the quality of
volunteered geographic information is not static. As VGI users do not
necessarily have high spatial knowledge, they cannot judge the quality of
the available datasets. Therefore, system administrators control the quality
of information in order to provide the users with the appropriate datasets.
However, the quality of spatial data has several parameters, such as spatial
accuracy, attribute accuracy, completeness, logical consistency and
up-to-dateness. The appropriate data may therefore differ from one
application to another, depending on the quality parameters that matter for
the given application. For example, for route planning, a more complete
dataset with lower spatial accuracy may be more relevant than an incomplete
dataset with high spatial accuracy. It is very common in geographic
communities to evaluate the relevance of datasets for an application based
on metadata, which expresses different aspects of dataset quality. In the
case of VGI, however, the users are not experts and do not necessarily have
enough spatial knowledge to interpret standard metadata statements.

In this paper, we propose providing VGI users with the spatial data quality
parameters through simple cartographic representations and letting them
evaluate the appropriateness of datasets for the application at hand. The
users select the desired quality parameters as well as the visualization
element (e.g. color, line thickness, intensity, style, etc.) used to
classify the datasets. The datasets are represented by the selected element
based on their metadata information, which helps the users to visually
evaluate the quality of the datasets.

The rest of the paper is structured as follows: Section 2 gives an overview
of recent work on quality issues in VGI, covering approaches to quality
assurance, quality assessment and quality representation. The proposed
approach for representing spatial data quality parameters in VGI is
described in Section 3 and is implemented for a case study in Section 4.
Finally, Section 5 concludes the paper and proposes ideas for future work.



2. Quality Issues in VGI 

As the amount, variety and usage of spatial data increase, spatial data
quality is receiving more attention (Ather 2009). "Unlike the geographic
information produced by mapping agencies and corporations, VGI carries no
guarantees of accuracy" (Goodchild 2009a), so its quality and reliability
are questionable.

The risks of using poor-quality VGI are essentially the same as the risks of
using poor-quality data from an official or commercial supplier: the source
of the data does not affect the results of using the data. The key
difference might be that an official agency or commercial vendor could
possibly be held legally accountable for its data, though in practice this
hardly ever happens because of disclaimers of liability (Cooper et al.
2012).

As VGI is mostly based on human experience of geography, perception-based
parameters express its spatial quality more effectively than the
measurement-based parameters used for official spatial data. Navratil (2009)
proposed expressing the quality of spatial data with possibility
distributions instead of precise numbers. De Longueville et al. (2010)
introduced the concept of degree of truth to describe object models with
vagueness: instead of evaluating whether an object has a given
characteristic, the degree of truth expresses how strongly the object tends
to have this characteristic (De Longueville et al. 2010). Flanagin and
Metzger argue that credibility, as a perceptual variable, is adequate for
evaluating collaborative productions. "Although there is no clear definition
of credibility, it is generally thought to be the believability of a source
or message, which is composed of two primary dimensions: trustworthiness and
expertise" (Flanagin & Metzger 2008).

Exel et al. (2010) stated that the reputation of users of VGI websites can
be assessed according to their experience and local knowledge. Wikimapia
ranks its users according to their level of experience, number of edits, and
number of objects, pictures, etc. they have uploaded. For example, a
newcomer is ranked at the lowest level (level zero), which means they can
only add objects and do not have the right to delete any object on the map.
As they gain more experience in mapping, their level rises, giving them more
authority.

Generally, data quality management has two major components: (1) quality
assurance, which controls the quality of data during data creation; and (2)
quality assessment, which evaluates the quality of the produced data and
whose result is organized in the form of metadata (Goodchild & Li 2012).

2.1. Quality Assurance in VGI 

Goodchild and Li presented three approaches for assuring the quality of VGI
(Goodchild & Li 2012):

• Crowd-sourcing approach: Information provided by a group of
people tends to be more accurate than that provided by a single
individual. This approach has been deployed by OSM in an online
service called "potlatch", where users can modify the existing data.
Furthermore, OSM users can mark detected errors in OpenStreetBugs
to be corrected by other users. This approach may be less effective
in the geographic domain, because many spatial errors can persist
even when many volunteers are active in an area.

• Social approach: Gate-keepers, who are the administrators or
high-level users, check new data in order to avoid gross errors,
vandalism, etc. In OSM, the Data Working Group (DWG) resolves
problems such as conflicts in data provided by different users, as
well as probable vandalism and violations. In Wikimapia, vandals
found by low-level users are reported to the gate-keepers, who
have the right to limit or even block their activities.

• Geographic approach: This approach has been specifically designed
for geographic data and can be automated. It suggests checking the
geographic data against a set of rules. For example, data that are
close to each other must be consistent. This also holds when
correlating datasets, e.g., water flows downhill. The keepright.at
website automatically detects some of the errors that exist in the
OSM data and highlights them with different symbols so that they
can be corrected by the users.

2.2. Quality Assessment in VGI 

Once the data have been produced, their quality is assessed using metadata
or by comparing them with reference data.

2.2.1. Metadata 

The vagueness of crowd-sourced data can be captured by two types of
metadata (De Longueville et al. 2010):



• User-encoded vagueness metadata: The user may contribute
additional information about the collected data, although this
information is not fully reliable (Goodchild 2008). For example, the
user may specify the spatial resolution of the image from which a
dataset has been digitized.

• System-created vagueness metadata: The system itself stores
some parameters related to the quality of the data, such as the
scale at which the data were added.



Despite the importance of metadata for VGI users, almost no metadata
exists for projects such as OpenStreetMap and Google Earth (Goodchild
2009b).

On the other hand, the existing metadata standards may not be relevant for
collaborative volunteered data. In the geospatial web, the relative quality
of the datasets that are being integrated is very important. Goodchild
introduced a binary, user-centric form of metadata called metadata 2.0. In
addition to the quality of a single dataset, metadata 2.0 describes the
ability of two datasets to work together (Goodchild 2008).

2.2.2. Comparison 

A common approach to measuring spatial quality is to compare the data with
a reference. The main issue here is choosing a proper reference dataset
(Haklay 2010), especially in VGI, where no reference dataset may be
available at all (Ciepluch et al. 2011).

Goodchild and Hunter (1997) proposed a method for evaluating the positional
accuracy of a dataset by comparing it with a reference. A buffer of a
certain width is created around the reference objects, and the proportion of
the tested dataset that lies within the buffer is calculated (Figure 1). The
level of accuracy obtained for the tested dataset depends on the size of the
buffer chosen (Goodchild & Hunter 1997). This method has been used to
compare OSM data with Ordnance Survey (OS) data (Ather 2009) and with
Hellenic Military Geographical Service (HMGS) data (Kounadi 2009).
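
The buffer comparison can be illustrated with a short sketch. The code
below is not the implementation used in the cited studies; it is a minimal
approximation in plain Python, assuming that both datasets are polylines
given as lists of (x, y) tuples in a projected coordinate system. The
function names and the sampling parameter step are hypothetical. Instead of
constructing a buffer polygon, it samples the tested line at short intervals
and counts the share of its length that lies no farther than buffer_width
from the reference line, which is equivalent for a simple distance buffer.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b (2D tuples)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment: a and b coincide
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Projection parameter of p onto the segment, clamped to [0, 1]
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def distance_to_polyline(p, line):
    """Shortest distance from p to a polyline with at least two vertices."""
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))


def proportion_within_buffer(tested, reference, buffer_width, step=0.1):
    """Approximate share of the tested line's length inside the buffer.

    Each tested segment is cut into pieces no longer than `step`; a piece
    counts as inside when its midpoint is within `buffer_width` of the
    reference line. `step` is an assumed sampling resolution.
    """
    inside_len = 0.0
    total_len = 0.0
    for a, b in zip(tested, tested[1:]):
        seg_len = math.hypot(b[0] - a[0], b[1] - a[1])
        if seg_len == 0.0:
            continue
        n = max(1, math.ceil(seg_len / step))
        piece = seg_len / n
        for i in range(n):
            t = (i + 0.5) / n
            mid = (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
            total_len += piece
            if distance_to_polyline(mid, reference) <= buffer_width:
                inside_len += piece
    return inside_len / total_len if total_len else 0.0


if __name__ == "__main__":
    reference = [(0.0, 0.0), (10.0, 0.0)]
    tested = [(0.0, 0.5), (5.0, 0.5), (5.0, 3.0), (10.0, 3.0)]
    # About 5.5 of the 12.5 length units lie within 1 unit of the reference
    print(proportion_within_buffer(tested, reference, buffer_width=1.0))
```

Repeating the calculation for several buffer widths shows how the reported
accuracy level depends on the chosen buffer size.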

2.3. Quality Presentation 

Once the quality of the data has been evaluated, this information is
presented to the users to help them assess the fitness of the data for their
use. Data producers often provide metadata describing different aspects of
the quality of the datasets. A metadata record is a file of information that
captures the basic characteristics of a data or information resource.

In VGI, many people with different knowledge and expertise contribute to
acquiring huge amounts of data, which results in datasets with different
characteristics. Furthermore, VGI users do not necessarily have enough
spatial knowledge to evaluate the quality of datasets. Therefore, system
administrators control the quality of information in order to provide the
users with the appropriate datasets. However, the quality of spatial data
has several parameters, so the appropriate data may differ from one
application to another. The result is a situation where different users
prefer different datasets, but are not expert enough to select them based on
technical metadata statements.

Currently produced metadata has limitations as a communication medium for
both non-expert and expert users (De Longueville et al. 2010). Metadata
often contains technical descriptions and terminology intended to be
interpreted by experts. Therefore, it is not very effective for non-expert
users of VGI (Devillers et al. 2002).

Several ideas have been proposed for presenting quality information to the
users of crowd-sourced environments through familiar concepts. For instance,
WikiTrust, an extension developed for Wikipedia, visually presents the
quality of articles to the users. The contents of the articles are analyzed
based on their stability, credibility, history, etc., and the result is
displayed as the color intensity of the background. This idea is adapted in
the next section to present quality information to VGI users.




Figure 1. Buffer method for evaluating the positional accuracy of a dataset
by comparison with a reference (Goodchild & Hunter 1997)



3. Proposed Approach to Represent Spatial Data Quality Information in VGI

In this section, we adapt the idea used in WikiTrust to provide non-expert
VGI users with the quality information of spatial data, in order to help
them evaluate the datasets for the application at hand. Once the quality
parameters of the datasets have been determined, the features are displayed
with a certain visual element so that the user gains an understanding of
their quality. The visual elements can be used as follows (Figure 2):

• Color classification: The datasets are classified into different
quality classes, and the datasets of each class are shown in a
different color. For example, the datasets with the highest, medium
and lowest quality are drawn in green, yellow and red, respectively.
This can be used for point, line and polygon feature datasets.

• Color intensity: The datasets are ordered based on their quality and
are shown with different color intensities. For example, the
datasets with the highest and the lowest quality are drawn in dark
blue and light blue, respectively. This can be used for point, line
and polygon feature datasets.

• Feature size: The datasets are ordered based on their quality and
are symbolized with different sizes. For point features, this means
different symbol sizes; for line features, different line
thicknesses. For polygon features, it can be adapted as a difference
in hatching intensity.
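
As an illustration of the color classification element, the following
sketch assigns each dataset to one of three classes. It assumes that every
dataset already has a single numeric quality score where higher means
better, and it uses equal-interval class breaks; both the scoring convention
and the classification scheme are assumptions made for the example, since
the approach itself does not prescribe them. The function name
classify_by_color is hypothetical.

```python
def classify_by_color(quality, colors=("red", "yellow", "green")):
    """Map dataset names to colors using equal-interval quality classes.

    quality: dict of dataset name -> numeric score (higher = better).
    colors:  class colors ordered from lowest to highest quality.
    """
    lowest = min(quality.values())
    highest = max(quality.values())
    span = highest - lowest
    n_classes = len(colors)
    result = {}
    for name, score in quality.items():
        if span == 0:
            # All datasets have the same score: put them in the top class
            index = n_classes - 1
        else:
            # Position of the score within the range, cut into equal classes
            index = min(int((score - lowest) / span * n_classes), n_classes - 1)
        result[name] = colors[index]
    return result


if __name__ == "__main__":
    scores = {"dataset_a": 0.2, "dataset_b": 0.5, "dataset_c": 0.9}
    print(classify_by_color(scores))
```

Other classification schemes, such as quantiles or user-defined breaks,
could be substituted without changing how the colors are applied to the map.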



4. Implementation 

This section describes the results of an implementation developed based on
the proposed approach. First, the data collection process is introduced.
The quality assessment and presentation of the collected data are presented
afterwards.

4.1. Data Collection 

Ten planimetric maps were produced for a small area, shown in Figure 3,
using different data collection methods: walking, metering, GPS marking,
GPS tracking, digitizing and surveying with a total station (Figures 4 and
5). In order to obtain datasets with different spatial qualities (limited
here to positional accuracy and completeness), the data collection was
performed by different users.
Figure 2. The proposed approach for representing spatial quality information
to the VGI user




Figure 3. The study area 
Figure 4. Examples of the produced maps from the study area




Figure 5. Overlay of the ten datasets collected from the study area 



4.2. Quality Assessment 

For each dataset, positional accuracy and completeness were assessed as
follows:

• Positional accuracy: Since there is no reference data against which
to assess the positional accuracy of the datasets, we obtained a
relative positional accuracy for each one. First, an initial
positional accuracy was assigned to each dataset depending on the
data collection method and the instruments used. This initial value
was used as the weight in computing a weighted average coordinate
for each point from its coordinates in all of the datasets in which
the point appears. For each point in each dataset, we calculated its
deviation from this average. Finally, the average of all the
deviations calculated for a dataset is assigned to that dataset as
its positional accuracy.

• Completeness: Again, we calculated a relative completeness
parameter for the datasets. The union of all points appearing in any
of the datasets was taken to be the complete data (we assumed that
no straight line is split into several segments in any of the
datasets). Dividing the number of points in each dataset by the
number of points in this union yields its completeness.
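
The two relative measures can be sketched as follows. This is an
illustrative reading of the procedure, not the code used in the study. It
assumes that corresponding points have already been matched across datasets
and share a common identifier, and that the weight of each dataset is
supplied as a number in which a larger value means more trust in that
dataset. The function name relative_quality and the data layout are
hypothetical.

```python
import math


def relative_quality(datasets, weights):
    """Relative positional deviation and completeness for each dataset.

    datasets: dict of dataset name -> {point_id: (x, y)}
    weights:  dict of dataset name -> weight derived from the initial
              positional accuracy (assumed: larger = more trusted)
    """
    # Weighted average coordinate of every point over all datasets
    # in which that point appears
    sums = {}
    for name, points in datasets.items():
        w = weights[name]
        for pid, (x, y) in points.items():
            sx, sy, sw = sums.get(pid, (0.0, 0.0, 0.0))
            sums[pid] = (sx + w * x, sy + w * y, sw + w)
    average = {pid: (sx / sw, sy / sw) for pid, (sx, sy, sw) in sums.items()}

    # The union of all points is treated as the complete data
    union_size = len(average)

    result = {}
    for name, points in datasets.items():
        deviations = [
            math.hypot(x - average[pid][0], y - average[pid][1])
            for pid, (x, y) in points.items()
        ]
        result[name] = {
            # Mean deviation from the weighted averages (smaller = better)
            "positional_deviation": (
                sum(deviations) / len(deviations) if deviations else float("nan")
            ),
            # Share of the union's points present in this dataset
            "completeness": len(points) / union_size if union_size else 0.0,
        }
    return result


if __name__ == "__main__":
    datasets = {
        "gps_track": {1: (0.0, 0.0), 2: (10.0, 0.0)},
        "digitized": {1: (2.0, 0.0)},
    }
    weights = {"gps_track": 1.0, "digitized": 1.0}
    print(relative_quality(datasets, weights))
```

Because both measures are computed only from the collected datasets, they
rank the datasets against each other and do not state absolute accuracy.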

4.3. Quality Presentation 

An ArcGIS extension was developed to visually represent the quality
information (i.e., positional accuracy and completeness) assessed for each
dataset to the user. The user selects a number of datasets as well as the
desired quality parameter (positional accuracy or completeness). When color
intensity is used, the user selects a base color; the values of the desired
quality parameter for the selected datasets are then distributed over the
gray scale of 0 to 255. When symbol size is used, the user sets the minimum
and maximum symbol size (Table 1); the symbol sizes of the selected datasets
are distributed over this range according to the selected quality parameter.
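
The distribution of quality values over an intensity or size range can be
sketched with a simple linear rescaling. The code below is independent of
ArcGIS and does not reproduce the extension; it assumes a linear mapping
between the smallest and largest quality value among the selected datasets,
and the function names are hypothetical. Whether a high value should map to
a dark or a light tone is left to the caller.

```python
def scale_linear(quality, out_min, out_max):
    """Linearly distribute quality values over [out_min, out_max].

    quality: dict of dataset name -> numeric value of the chosen parameter.
    """
    lowest = min(quality.values())
    highest = max(quality.values())
    if highest == lowest:
        # Identical values: every dataset gets the upper end of the range
        return {name: float(out_max) for name in quality}
    factor = (out_max - out_min) / (highest - lowest)
    return {name: out_min + (q - lowest) * factor for name, q in quality.items()}


def gray_levels(quality):
    """Integer intensity levels between 0 and 255 for the selected datasets."""
    return {name: round(v) for name, v in scale_linear(quality, 0, 255).items()}


def symbol_sizes(quality, min_size, max_size):
    """Symbol sizes between the user-defined minimum and maximum."""
    return scale_linear(quality, min_size, max_size)


if __name__ == "__main__":
    completeness = {"dataset_a": 0.0, "dataset_b": 0.2, "dataset_c": 1.0}
    print(gray_levels(completeness))
    print(symbol_sizes(completeness, min_size=2, max_size=10))
```

The same rescaling serves both visual elements; only the output range
changes between intensity levels and symbol sizes.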



5. Conclusion and Future Work 

This paper presents the important issues related to quality management in
volunteered geographic information (VGI). Quality assurance, assessment and
representation of VGI were discussed in this regard. In particular, we
propose providing the users with spatial quality information through visual
elements. As VGI users do not necessarily have high spatial knowledge, this
approach helps them to evaluate and compare the available datasets based on
the quality parameters that are important for their current application.



Here, we focused on positional accuracy and completeness as the quality
parameters. However, there may be other parameters important to the users,
e.g., up-to-dateness or logical consistency. In addition, our implementation
can classify the datasets based on only one quality parameter at a time.
Extending it to classify the datasets by a combination of quality parameters
simultaneously is a direction for future research.



Table 1. Visual representation of spatial quality parameters of the datasets
by color, line thickness and both